A Post by Michael B. Spring

Academic Life in the Green Zone (December 20, 2009)

When an organization is concerned about the security of its digital assets, it creates a firewall to protect them. It is a rather poor analogy to a building or car firewall, but it will do. In theory, the private network, devices, and data are safe inside the firewall. Again as a simplification, the firewall is generally considered a system that prevents outsiders from coming in but allows insiders to venture out. (Some organizations also restrict outgoing traffic, e.g. to pornographic sites.) When some assets of the protected organization need exposure to the outside world, such as a website we want people to see, we place that asset in a “DMZ” or demilitarized zone. We feed it from the protected zone and take great care to protect it, and still greater care to make sure that any damage that might be done to it is contained and easily repairable. This is a rather simplistic, but complete, picture of many organizations today. Our firewall surrounds us and protects us from the “hostile environment” that the internet has become for many organizations. The exposed part of our digital presence is placed in a heavily guarded demilitarized zone. We live in a heavily guarded Green Zone. (On days when our protectors seem both overly aggressive and lacking in common sense, I am more likely to think of the Green Zone as the Militarized Zone.)

There is a sense of déjà vu in this situation, although the parallels are not quite perfect. Prior to 1980, computing devices were expensive and shared. Specialized staff maintained them in secure zones. When I attended NYU in the late sixties, “regular” computer science students were not even allowed on the floor of the Courant Institute that housed the mainframe. Acolytes carried our trays of cards up to the second floor where Deacons put them in the job queue. The results of our efforts were returned as a computer printout the next day. What could be done with the time-shared computers of the 1960s and 1970s was a matter of negotiation with the gods of the computer center. They chose the hardware, operating system, software, etc. They allocated precious disk space and job priorities. The resources allocated to you were allocated after consideration of the needs of the organization as a whole. Violate the code of conduct and you could be shut down. In the 80s, with the advent of PCs, inexpensive minicomputers, and specialized workstations, the world began to change. In academia, the development of networks and networks of networks – the internet – began to change the very fabric of computing. First, each of us controlled and determined appropriate use for our own machines. Second, devices historically dedicated to computation began to be used for communication. With the introduction of the protocols for simple distributed information sharing (HTTP), the resulting structure, the World Wide Web, emerged to challenge our entire vision of information processing. The computational fabric was moving from the world of data to the world of information – and would eventually begin to make inroads into the world of knowledge. The creation, dissemination, and use of information began to change. A new universe was here to replace the Gutenberg galaxy.
But something else was changing as well. With the popularization of the internet, email, and the web, business realized that a part of its business was also related to information. With digital music and photographs, high-speed connections to the home, and businesses seeing a new marketing channel, it was not long before the information highway was also seen as a financial marketplace. When the money started to flow, criminals moved in. And we found the need to create safe walled communities – green zones. With that overly long, but hopefully reasonably accurate, description of 50 years of computing, we can turn to asking how academic life is impacted in the green zone.

Academic life involves three kinds of activities: instruction, research, and service. I reflect here on the first two, although the third is undoubtedly impacted as well. I happen to teach things like “web standards and technology”, “e-business”, “e-business security”, and “web services and distributed computing”. My research is focused on social networking systems and web-delivered medical interventions.

In teaching, I need to help students develop skills related to building distributed systems that link organizations across the internet. For example:

Students I teach write programs more than they write papers. They write programs in Java, JavaScript, Perl, C, etc. and mail them to me to be graded. Those of you who have been subject to email viruses know that one way hackers attack machines is by sending mail attachments that are malicious programs. So what do our protectors do? They delete anything that might be malicious content from incoming emails, and for most users they do an amazing job of eliminating only the malicious stuff. Guess what happens to student homework? We find more and more clever ways to work around the restrictions, and our protectors find better and better ways to protect us. Many of my colleagues use Gmail to avoid these issues. Every time I hear this, I cringe as I think about the confidentiality of faculty-student communication when we are forced to move it into a public space because our private space is protecting us too much.
My students should gain expertise in the development of programmatic elements of web services. Let me put it more simply. They need to learn how to handle a shopping cart to process an online order, or to build an online help desk with chat facilities. This is done by writing programs that control the web interaction. Each student at the university is given facilities, controlled in our green zone, to build websites and see how they work. But because our institution is legitimately concerned about abuses, those websites are limited to the simplest forms, without programmatic capabilities. That is to say, students are allowed to put up static web pages such as were reasonable 15 years ago, but which are horribly antiquated by today’s standards. I am allowed to run my own servers, but their existence is frowned upon and viewed as a liability.
A business wants to know how its website is being used by clients. The design of a website should reflect a detailed understanding of how it is used. Fortunately, the design of web servers is such that they can be set up to keep detailed logs of the interactions that take place. These logs can be analyzed to see who is visiting us, what they are looking for, where they get lost, how they get from place to place, etc. It is useful to provide these detailed logs to students and to have them conduct analyses and make recommendations for improvement. Businesses do this on a regular basis. Unfortunately, the logs kept by the university on its website have been defined as private information and are sealed. Students cannot look at them and assess them. (Indeed, I am not sure that the University officials who should be concerned about how the website is used bother to ask for this data for analysis.)
One of the hottest topics today is social networking sites – like LinkedIn, Facebook, and Flickr. These sites allow people to volunteer information for public exposure. Our students use these sites, and at least some of our students will be asked to build these sites. Perhaps the most exciting academic challenge in the design of social networking systems is the back-end analysis of the data. For example, we can examine statistical correlations among users of the system, the contributions they make, and the patterns of their behavior. How do we conduct that kind of analysis? We mount a public site that provides some desirable service – such as recommendations for restaurants or events. We then allow people to use it and watch what they do. This presents a risk, and we have all read about inappropriate behavior on these sites. For this reason, the University frowns upon the development of social networking sites that are publicly accessible. Liability understood, how do we teach the next generation of developers to build better sites? We could do it in a totally controlled environment where only our students could access the site, but making it accessible to 20 students doesn’t teach us how to control it when it is being accessed by millions of people.
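
To give a flavor of the web server log analysis described in the examples above, here is a minimal sketch of the kind of exercise a student might do. It assumes the logs are in the Common Log Format that many web servers can produce. The sample lines, addresses, and page names are invented for illustration and are not real university data, and the function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Pattern for one line of the Common Log Format (an assumption about the log layout).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def summarize(lines):
    """Count requests per page, count missing pages, and record each visitor's path."""
    hits = Counter()
    not_found = Counter()
    paths_by_host = defaultdict(list)
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip malformed lines
        hits[entry["path"]] += 1
        if entry["status"] == "404":
            not_found[entry["path"]] += 1  # a place where a visitor got lost
        paths_by_host[entry["host"]].append(entry["path"])
    return hits, not_found, dict(paths_by_host)


# Invented sample lines, for illustration only.
SAMPLE = [
    '192.0.2.1 - - [20/Dec/2009:10:00:00 -0500] "GET /index.html HTTP/1.1" 200 1043',
    '192.0.2.1 - - [20/Dec/2009:10:00:05 -0500] "GET /courses.html HTTP/1.1" 200 2210',
    '192.0.2.7 - - [20/Dec/2009:10:01:00 -0500] "GET /index.html HTTP/1.1" 200 1043',
    '192.0.2.7 - - [20/Dec/2009:10:01:09 -0500] "GET /admisions.html HTTP/1.1" 404 210',
]

if __name__ == "__main__":
    hits, not_found, paths = summarize(SAMPLE)
    print(hits.most_common(3))
    print(not_found)
    print(paths)
```

Even a toy summary like this shows which pages are popular, which requests fail, and how each visitor moved from page to page. That is exactly the material students cannot practice on when the real logs are sealed.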

Research can also suffer in the green zone. The issues are a little different and a few examples will serve to illuminate the scope of the problem.

Consider the development of a website that is intended for research on whether certain kinds of psychoeducational intervention will reduce symptoms and improve quality of life for persons with severe mental illness. The university and the researcher would like the website to be secure and the data collected to be protected. This is reasonable, and required by IRB regulations and HIPAA law. To ensure my server is monitored, the University would be happiest if it was built on their protected platforms, but the kinds of programmatic tools that support the website are not supported by the University system. So, we can develop the website if they don’t protect us, but that is not good. They can protect us if we don’t build the website, but that is a non-starter.
By the way, how will we measure the use of a website by the subject and correlate it with the change in symptoms or quality of life? The answer is that we will register the users and track their use of the website. Remember those logs I talked about earlier. It is possible to add information about the subjects to the logs that allows us to track who read which page or which bulletin board posting. There are a couple of problems here. The University does a wonderful job of controlling access to protected websites for its members, but it has no category for subjects in a research study – they appear and disappear too fast. In addition, if the University kept this information in logs, which I don’t believe it does, it would not allow us access to them. Thus, we would not be able to do the hours of analysis that we now do to show that the web-based intervention was directly related to changes in symptoms.
Research is not done in isolation. We have collaborators with whom we work at other institutions. It is not uncommon to want to provide access to data between institutions in an automated fashion. That is, machine A makes a connection to machine B and checks to see if new data has been collected by B. If so, it is transferred from B to A. This is done securely by what is called a “cron job” or “scheduled task” operating over an SSL (Secure Sockets Layer) connection using public/private key authentication. Trust me, it makes good sense and happens all the time. But we have disabled these functions for the users of the University system because they present a possible security loophole. We find ourselves between another rock and a hard place.
One final example. For a variety of reasons, like HIPAA compliance and IRB requirements, we can’t send information about subject activity to researchers via email. (Actually, we could, but getting researchers to use secure email is very difficult.) As an alternative, we can ask the researchers to check their secure website every ten minutes to see if anything has happened that requires their attention, but that doesn’t work either. One solution we have found is that when something happens on the website we can send a very simple, unsecured email to the researcher that says their attention is needed. No private information. Just a note to say they are needed. They can then securely connect to the site and take care of business – an optimal use of everyone’s time. Recently, the University improved the security and quality of the email services by requiring the user to log in to send mail. This protects us from programs that spew unauthorized spam mail. Unfortunately, it also blocks our legitimate notifications.
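
The notification approach in the last example can be sketched in a few lines. This is only an illustration under stated assumptions, not a description of how our systems or the University's mail service are actually set up. The addresses, host name, and function names are hypothetical, and the sending step assumes a mail server that accepts a login over TLS, which is one way a program could still send mail once a login is required.

```python
import smtplib
from email.message import EmailMessage


def build_notification(sender, recipient, site_url):
    """Build a notice that says only that attention is needed; no subject data."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Your attention is needed"
    msg.set_content(
        "Something on the study website requires your attention.\n"
        f"Please log in securely at {site_url} to review it.\n"
    )
    return msg


def send_notification(msg, host, port, username, password):
    """Send the message through a mail server that requires a login (assumed setup)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()  # upgrade the connection to TLS before authenticating
        server.login(username, password)
        server.send_message(msg)


if __name__ == "__main__":
    # Hypothetical addresses; nothing is sent unless send_notification is called.
    note = build_notification(
        "study-server@example.edu",
        "researcher@example.edu",
        "https://study.example.edu",
    )
    print(note)
```

The message body is deliberately fixed text plus a link. Keeping the private details on the secure site, and out of the email, is what makes an unsecured notice acceptable in the first place.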

Having drawn a picture of some of the drawbacks of academic life in a green zone, let me make clear that I like life in a green zone. It makes many of the things I want to do easier. It provides me with 24x7 monitors for my systems, assures better power and internet connections, and gives me reliable automatic backup. Further, when we encounter most of the problems mentioned above, it is possible to correct the problem by reaching senior technical people. For example, when I raised the issue of automated email notifications, I was told that I could have my server “white listed”, i.e. if I provided in advance the address of the machine that would be sending notifications, the controlling personnel would make sure the mail was still accepted.

The issue, and the point of this blog entry, is that just as we had to develop a better form of communication between users and the “high priests” of centralized systems in the 70s, so today we need to provide better communications between the denizens of the academic community and the technical staff patrolling and protecting the green zone. The academic life is about exploring and developing the new world, and sometimes this exploration can be stymied when those protecting us don’t understand what we are trying to do.